Assessment of Stability in Partitional Clustering Using Resampling Techniques
نویسنده
چکیده
The assessment of stability in cluster analysis is strongly related to the main difficult problem of determining the number of clusters present in the data. The latter is subject of many investigations and papers considering different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of results of partitional clustering. In detail, we investigate here only the very popular K-means method. The estimation of the sampling distribution of the adjusted Rand index (ARI) and the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., with different cardinality of the drawn samples) with respect to their performance in finding the true number of clusters for both synthetic and real data.
منابع مشابه
A Detailed Study and Analysis of different Partitional Data Clustering Techniques
The concept of Data Clustering is considered to be very significant in various application areas like text mining, fraud detection, health care, image processing, bioinformatics etc. Due to its application in a variety of domains, various techniques are presented by many research domains in the literature. Data Clustering is one of the important tasks that make up Data Mining. Clustering can be...
متن کاملA Comparative Study of Clustering Methods with Multinomial Distribution
In this paper, we study different discrete data clustering methods, which use the Model-Based Clustering (MBC) framework with the Multinomial distribution. Our study comprises several relevant issues, such as initialization, model estimation and model selection. Additionally, we propose a novel MBC method by efficiently combining the partitional and hierarchical clustering techniques. We conduc...
متن کاملC ONSTRAINT BASED P ARTITIONAL C LUSTERING – A C OMPREHENSIVE S TUDY AND A NALYSIS Aparna
Data clustering is the concept of forming predefined number of clusters where the data points within each cluster are very similar to each other and the data points between clusters are dissimilar to each other. The concept of clustering is widely used in various domains like bioinformatics, medical data, imaging, marketing study and crime analysis. The popular types of clustering techniques ar...
متن کاملPre Processing Techniques for Arabic Documents Clustering
Clustering of text documents is an important technique for documents retrieval. It aims to organize documents into meaningful groups or clusters. Preprocessing text plays a main role in enhancing clustering process of Arabic documents. This research examines and compares text preprocessing techniques in Arabic document clustering. It also studies effectiveness of text preprocessing techniques: ...
متن کاملInformal version for personal use Scalable Clustering
2 Clustering Techniques: A Brief Survey 4 2.1 Partitional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 Hierarchical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Discriminative vs. Generative Models . . . . . . . . . . . . . . . . . 12 2.4 Assessment of Results . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.4.1 Internal (model-based, unsup...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017